Evaluation: https://www.kaggle.com/c/learnplatform-covid19-impact-on-digital-learning/overview/evaluation
Timeline :
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import re
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
path = os.getcwd()
print(path)
C:\Users\toto\Documents\Github\KaggleDataAnalysis\kaggle_1
districts_info = pd.read_csv("../../data/learnplatform-covid19-impact/districts_info.csv")
products_info = pd.read_csv("../../data/learnplatform-covid19-impact/products_info.csv")
districts_info.shape, products_info.shape
((233, 7), (372, 6))
# Check the files in the engagement_data folder
os.listdir("../../data/learnplatform-covid19-impact/engagement_data")
['1000.csv', '1039.csv', '1044.csv', '1052.csv', '1131.csv', '1142.csv', '1179.csv', '1204.csv', '1270.csv', '1324.csv', '1444.csv', '1450.csv', '1470.csv', '1536.csv', '1549.csv', '1558.csv', '1570.csv', '1584.csv', '1624.csv', '1705.csv', '1712.csv', '1742.csv', '1772.csv', '1791.csv', '1857.csv', '1877.csv', '1904.csv', '1965.csv', '2017.csv', '2060.csv', '2074.csv', '2106.csv', '2130.csv', '2165.csv', '2167.csv', '2172.csv', '2201.csv', '2209.csv', '2238.csv', '2257.csv', '2285.csv', '2321.csv', '2339.csv', '2393.csv', '2439.csv', '2441.csv', '2517.csv', '2549.csv', '2567.csv', '2598.csv', '2601.csv', '2685.csv', '2729.csv', '2779.csv', '2870.csv', '2872.csv', '2940.csv', '2956.csv', '2991.csv', '3080.csv', '3160.csv', '3188.csv', '3222.csv', '3228.csv', '3248.csv', '3266.csv', '3301.csv', '3314.csv', '3322.csv', '3371.csv', '3390.csv', '3393.csv', '3412.csv', '3471.csv', '3550.csv', '3558.csv', '3580.csv', '3640.csv', '3668.csv', '3670.csv', '3692.csv', '3710.csv', '3732.csv', '3772.csv', '3864.csv', '3936.csv', '3959.csv', '3986.csv', '4029.csv', '4031.csv', '4051.csv', '4083.csv', '4165.csv', '4183.csv', '4203.csv', '4314.csv', '4348.csv', '4373.csv', '4408.csv', '4516.csv', '4520.csv', '4550.csv', '4569.csv', '4591.csv', '4602.csv', '4629.csv', '4666.csv', '4668.csv', '4683.csv', '4744.csv', '4749.csv', '4775.csv', '4808.csv', '4921.csv', '4929.csv', '4936.csv', '4937.csv', '4949.csv', '5006.csv', '5042.csv', '5057.csv', '5150.csv', '5231.csv', '5257.csv', '5380.csv', '5404.csv', '5422.csv', '5479.csv', '5510.csv', '5524.csv', '5527.csv', '5600.csv', '5604.csv', '5627.csv', '5802.csv', '5882.csv', '5890.csv', '5903.csv', '5934.csv', '5970.csv', '5987.csv', '6046.csv', '6049.csv', '6055.csv', '6066.csv', '6104.csv', '6131.csv', '6144.csv', '6165.csv', '6194.csv', '6250.csv', '6345.csv', '6418.csv', '6512.csv', '6577.csv', '6584.csv', '6640.csv', '6665.csv', '6721.csv', '6762.csv', '6774.csv', '6919.csv', '6998.csv', '7086.csv', '7164.csv', '7177.csv', 
'7305.csv', '7308.csv', '7342.csv', '7352.csv', '7387.csv', '7457.csv', '7541.csv', '7614.csv', '7660.csv', '7675.csv', '7723.csv', '7741.csv', '7752.csv', '7767.csv', '7785.csv', '7798.csv', '7829.csv', '7858.csv', '7964.csv', '7970.csv', '7975.csv', '7980.csv', '8017.csv', '8076.csv', '8103.csv', '8127.csv', '8160.csv', '8184.csv', '8256.csv', '8328.csv', '8425.csv', '8433.csv', '8515.csv', '8520.csv', '8539.csv', '8556.csv', '8685.csv', '8702.csv', '8723.csv', '8748.csv', '8784.csv', '8796.csv', '8815.csv', '8845.csv', '8884.csv', '8902.csv', '8937.csv', '9007.csv', '9043.csv', '9120.csv', '9140.csv', '9230.csv', '9303.csv', '9357.csv', '9463.csv', '9478.csv', '9515.csv', '9536.csv', '9537.csv', '9553.csv', '9589.csv', '9729.csv', '9778.csv', '9812.csv', '9839.csv', '9899.csv', '9927.csv']
districts_info.head()
| | district_id | state | locale | pct_black/hispanic | pct_free/reduced | county_connections_ratio | pp_total_raw |
|---|---|---|---|---|---|---|---|
| 0 | 8815 | Illinois | Suburb | [0, 0.2[ | [0, 0.2[ | [0.18, 1[ | [14000, 16000[ |
| 1 | 2685 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 4921 | Utah | Suburb | [0, 0.2[ | [0.2, 0.4[ | [0.18, 1[ | [6000, 8000[ |
| 3 | 3188 | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 2238 | NaN | NaN | NaN | NaN | NaN | NaN |
products_info.head()
| | LP ID | URL | Product Name | Provider/Company Name | Sector(s) | Primary Essential Function |
|---|---|---|---|---|---|---|
| 0 | 13117 | https://www.splashmath.com | SplashLearn | StudyPad Inc. | PreK-12 | LC - Digital Learning Platforms |
| 1 | 66933 | https://abcmouse.com | ABCmouse.com | Age of Learning, Inc | PreK-12 | LC - Digital Learning Platforms |
| 2 | 50479 | https://www.abcya.com | ABCya! | ABCya.com, LLC | PreK-12 | LC - Sites, Resources & Reference - Games & Si... |
| 3 | 92993 | http://www.aleks.com/ | ALEKS | McGraw-Hill PreK-12 | PreK-12; Higher Ed | LC - Digital Learning Platforms |
| 4 | 73104 | https://www.achieve3000.com/ | Achieve3000 | Achieve3000 | PreK-12 | LC - Digital Learning Platforms |
# Check the files in the engagement_data folder (first ten only)
list1 = os.listdir("../../data/learnplatform-covid19-impact/engagement_data")
list1[0:10]
['1000.csv', '1039.csv', '1044.csv', '1052.csv', '1131.csv', '1142.csv', '1179.csv', '1204.csv', '1270.csv', '1324.csv']
dis_info_1000 = pd.read_csv("../../data/learnplatform-covid19-impact/engagement_data/1000.csv")
dis_info_1000.head()
| | time | lp_id | pct_access | engagement_index |
|---|---|---|---|---|
| 0 | 2020-01-01 | 93690.0 | 0.00 | NaN |
| 1 | 2020-01-01 | 17941.0 | 0.03 | 0.90 |
| 2 | 2020-01-01 | 65358.0 | 0.03 | 1.20 |
| 3 | 2020-01-01 | 98265.0 | 0.57 | 37.79 |
| 4 | 2020-01-01 | 59257.0 | 0.00 | NaN |
Dropping Districts with NaN States
print(districts_info.shape)
districts_info = districts_info[districts_info.state.notna()].reset_index(drop=True)
print(districts_info.shape)
(233, 7)
(176, 7)
One-Hot Encoding the Product Sectors
products_info['Sector(s)'].unique()
array(['PreK-12', 'PreK-12; Higher Ed', 'PreK-12; Higher Ed; Corporate',
nan, 'Corporate', 'Higher Ed; Corporate'], dtype=object)
temp_sectors = products_info['Sector(s)'].str.get_dummies(sep="; ")
temp_sectors.head()
| | Corporate | Higher Ed | PreK-12 |
|---|---|---|---|
| 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 |
| 2 | 0 | 0 | 1 |
| 3 | 0 | 1 | 1 |
| 4 | 0 | 0 | 1 |
temp_sectors.columns = [f"sector_{re.sub(' ', '', c)}" for c in temp_sectors.columns]
temp_sectors.columns
Index(['sector_Corporate', 'sector_HigherEd', 'sector_PreK-12'], dtype='object')
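The encoding step above can be tried on a small stand-in. This is a minimal sketch, assuming a toy Series whose values mirror the `Sector(s)` entries shown earlier; the variable names `sectors` and `dummies` are illustrative and not part of the notebook.

```python
import pandas as pd

# Toy stand-in for products_info['Sector(s)'], including one missing value.
sectors = pd.Series(["PreK-12", "PreK-12; Higher Ed", None, "Higher Ed; Corporate"])

# Split each entry on "; " and build one 0/1 indicator column per distinct label.
dummies = sectors.str.get_dummies(sep="; ")

# Prefix the columns and strip spaces, as in the cell above.
dummies.columns = [f"sector_{c.replace(' ', '')}" for c in dummies.columns]
print(dummies)
```

A missing sector ends up as a row of zeros rather than as NaN, which is worth keeping in mind when summing the indicator columns later.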
products_info = products_info.join(temp_sectors)
products_info.head()
| | LP ID | URL | Product Name | Provider/Company Name | Sector(s) | Primary Essential Function | sector_Corporate | sector_HigherEd | sector_PreK-12 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 13117 | https://www.splashmath.com | SplashLearn | StudyPad Inc. | PreK-12 | LC - Digital Learning Platforms | 0 | 0 | 1 |
| 1 | 66933 | https://abcmouse.com | ABCmouse.com | Age of Learning, Inc | PreK-12 | LC - Digital Learning Platforms | 0 | 0 | 1 |
| 2 | 50479 | https://www.abcya.com | ABCya! | ABCya.com, LLC | PreK-12 | LC - Sites, Resources & Reference - Games & Si... | 0 | 0 | 1 |
| 3 | 92993 | http://www.aleks.com/ | ALEKS | McGraw-Hill PreK-12 | PreK-12; Higher Ed | LC - Digital Learning Platforms | 0 | 1 | 1 |
| 4 | 73104 | https://www.achieve3000.com/ | Achieve3000 | Achieve3000 | PreK-12 | LC - Digital Learning Platforms | 0 | 0 | 1 |
products_info.drop("Sector(s)", axis=1, inplace=True)
print(products_info.columns)
del temp_sectors
Index(['LP ID', 'URL', 'Product Name', 'Provider/Company Name',
'Primary Essential Function', 'sector_Corporate', 'sector_HigherEd',
'sector_PreK-12'],
dtype='object')
products_info.head()
| | LP ID | URL | Product Name | Provider/Company Name | Primary Essential Function | sector_Corporate | sector_HigherEd | sector_PreK-12 |
|---|---|---|---|---|---|---|---|---|
| 0 | 13117 | https://www.splashmath.com | SplashLearn | StudyPad Inc. | LC - Digital Learning Platforms | 0 | 0 | 1 |
| 1 | 66933 | https://abcmouse.com | ABCmouse.com | Age of Learning, Inc | LC - Digital Learning Platforms | 0 | 0 | 1 |
| 2 | 50479 | https://www.abcya.com | ABCya! | ABCya.com, LLC | LC - Sites, Resources & Reference - Games & Si... | 0 | 0 | 1 |
| 3 | 92993 | http://www.aleks.com/ | ALEKS | McGraw-Hill PreK-12 | LC - Digital Learning Platforms | 0 | 1 | 1 |
| 4 | 73104 | https://www.achieve3000.com/ | Achieve3000 | Achieve3000 | LC - Digital Learning Platforms | 0 | 0 | 1 |
Splitting the 'Primary Essential Function' Column into Main and Sub-Categories
# Rows with a missing value are passed through unchanged (NaN stays NaN).
products_info['pri_function_main'] = products_info['Primary Essential Function'].apply(lambda x: x.split(' - ')[0] if pd.notna(x) else x)
products_info['pri_function_sub'] = products_info['Primary Essential Function'].apply(lambda x: x.split(' - ')[1] if pd.notna(x) else x)
products_info.head()
| | LP ID | URL | Product Name | Provider/Company Name | Primary Essential Function | sector_Corporate | sector_HigherEd | sector_PreK-12 | pri_function_main | pri_function_sub |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13117 | https://www.splashmath.com | SplashLearn | StudyPad Inc. | LC - Digital Learning Platforms | 0 | 0 | 1 | LC | Digital Learning Platforms |
| 1 | 66933 | https://abcmouse.com | ABCmouse.com | Age of Learning, Inc | LC - Digital Learning Platforms | 0 | 0 | 1 | LC | Digital Learning Platforms |
| 2 | 50479 | https://www.abcya.com | ABCya! | ABCya.com, LLC | LC - Sites, Resources & Reference - Games & Si... | 0 | 0 | 1 | LC | Sites, Resources & Reference |
| 3 | 92993 | http://www.aleks.com/ | ALEKS | McGraw-Hill PreK-12 | LC - Digital Learning Platforms | 0 | 1 | 1 | LC | Digital Learning Platforms |
| 4 | 73104 | https://www.achieve3000.com/ | Achieve3000 | Achieve3000 | LC - Digital Learning Platforms | 0 | 0 | 1 | LC | Digital Learning Platforms |
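The same split can also be done with the vectorized string accessor. The following is a sketch on made-up values, not the notebook's own method; the third-level label "Extra Detail" is hypothetical and only stands in for entries that contain a second " - ".

```python
import pandas as pd

# Toy stand-in for products_info['Primary Essential Function'].
func = pd.Series([
    "LC - Digital Learning Platforms",
    "LC - Sites, Resources & Reference - Extra Detail",  # hypothetical 3-part value
    None,
])

# expand=True returns one column per piece; missing rows stay missing.
parts = func.str.split(" - ", expand=True)
main = parts[0]  # e.g. "LC"
sub = parts[1]   # second piece only, matching the [1] index used above
print(pd.DataFrame({"main": main, "sub": sub}))
```

As with the `apply` version, anything after a second " - " is dropped from the sub-category.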
products_info['pri_function_sub'].unique()
array(['Digital Learning Platforms', 'Sites, Resources & Reference',
'Courseware & Textbooks', 'Study Tools', 'Teacher Resources',
'Learning Management Systems (LMS)', 'Content Creation & Curation',
'Online Course Providers & Technical Skills Development',
'Classroom Engagement & Instruction', 'School Management Software',
'Other', 'Data, Analytics & Reporting', 'Virtual Classroom', nan,
'Career Planning & Job Search', 'Human Resources',
'Large-Scale & Standardized Testing',
'Sites, Resources & References',
'Admissions, Enrollment & Rostering',
'Environmental, Health & Safety (EHS) Compliance'], dtype=object)
# Synchronize similar values
products_info['pri_function_sub'] = products_info['pri_function_sub'].replace(
{'Sites, Resources & References' : 'Sites, Resources & Reference'})
products_info.drop("Primary Essential Function", axis=1, inplace=True)
products_info['pri_function_sub'].unique()
array(['Digital Learning Platforms', 'Sites, Resources & Reference',
'Courseware & Textbooks', 'Study Tools', 'Teacher Resources',
'Learning Management Systems (LMS)', 'Content Creation & Curation',
'Online Course Providers & Technical Skills Development',
'Classroom Engagement & Instruction', 'School Management Software',
'Other', 'Data, Analytics & Reporting', 'Virtual Classroom', nan,
'Career Planning & Job Search', 'Human Resources',
'Large-Scale & Standardized Testing',
'Admissions, Enrollment & Rostering',
'Environmental, Health & Safety (EHS) Compliance'], dtype=object)
products_info[ ['sector_Corporate', 'sector_HigherEd', 'sector_PreK-12',
'pri_function_main', 'pri_function_sub'] ]
| | sector_Corporate | sector_HigherEd | sector_PreK-12 | pri_function_main | pri_function_sub |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | LC | Digital Learning Platforms |
| 1 | 0 | 0 | 1 | LC | Digital Learning Platforms |
| 2 | 0 | 0 | 1 | LC | Sites, Resources & Reference |
| 3 | 0 | 1 | 1 | LC | Digital Learning Platforms |
| 4 | 0 | 0 | 1 | LC | Digital Learning Platforms |
| ... | ... | ... | ... | ... | ... |
| 367 | 1 | 1 | 1 | SDO | Other |
| 368 | 1 | 1 | 1 | LC | Content Creation & Curation |
| 369 | 0 | 1 | 1 | LC | Sites, Resources & Reference |
| 370 | 0 | 0 | 0 | NaN | NaN |
| 371 | 0 | 0 | 0 | NaN | NaN |
372 rows × 5 columns
districts_info.district_id.unique()
array([8815, 4921, 5987, 3710, 7177, 9812, 6584, 1044, 7457, 1904, 5527,
2257, 7614, 4808, 1877, 2779, 8328, 8539, 9043, 1549, 4051, 7305,
2167, 6577, 4602, 4936, 4520, 7785, 3668, 7970, 5231, 9589, 8433,
2165, 2074, 1142, 7964, 8784, 7798, 3550, 1444, 2601, 7660, 9899,
1742, 4629, 4569, 4949, 6250, 8425, 6418, 1558, 3222, 1772, 5604,
9007, 8884, 1712, 3412, 2940, 5042, 3692, 4683, 2567, 2321, 7767,
7308, 5006, 9140, 8902, 5890, 4031, 6640, 6194, 3864, 2598, 5600,
2991, 2106, 6919, 7980, 2060, 7387, 1000, 5150, 2956, 9553, 1536,
8937, 1791, 4516, 2872, 2439, 8520, 2130, 3772, 4775, 9778, 5524,
1470, 5802, 1324, 3160, 2393, 9230, 3248, 8556, 5627, 4550, 7752,
2729, 4348, 3986, 9537, 1052, 6762, 3670, 1204, 2870, 3558, 1450,
3080, 2517, 1570, 4668, 6055, 2285, 2172, 7741, 6998, 3322, 4083,
3936, 7675, 4744, 9478, 7541, 1270, 8076, 6345, 4183, 9357, 5510,
6104, 3228, 5422, 8127, 3640, 8256, 1857, 5479, 3314, 8748, 4373,
7342, 6046, 7723, 5934, 9927, 2441, 6144, 4314, 9536, 6512, 3732,
2201, 9303, 3266, 1965, 5882, 1705, 9515, 8103, 4929, 7975, 7164],
dtype=int64)
PATH = '../../data/learnplatform-covid19-impact/engagement_data'
temp = []
# Read one CSV per remaining district and tag every row with its district_id
for district in districts_info.district_id.unique():
    df = pd.read_csv(f'{PATH}/{district}.csv', index_col=None, header=0)
    df['district_id'] = district
    temp.append(df)
len(temp)
176
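The loading loop can be wrapped in a small helper so it is easy to reuse. This is only a sketch: `load_engagement` is a hypothetical name, and the demo writes two tiny made-up CSV files to a temporary folder instead of touching the real data.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_engagement(folder, district_ids):
    """Read one CSV per district and tag each row with its district_id."""
    frames = []
    for district in district_ids:
        df = pd.read_csv(Path(folder) / f"{district}.csv")
        df["district_id"] = district
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


# Demo on two made-up files that use the same column names as the real ones.
with tempfile.TemporaryDirectory() as tmp:
    for district, rows in {1000: 2, 1039: 3}.items():
        pd.DataFrame({
            "time": ["2020-01-01"] * rows,
            "lp_id": list(range(rows)),
            "pct_access": [0.0] * rows,
            "engagement_index": [None] * rows,
        }).to_csv(Path(tmp) / f"{district}.csv", index=False)
    demo = load_engagement(tmp, [1000, 1039])

print(demo.shape)
```

Using `ignore_index=True` in `pd.concat` has the same effect as the separate `reset_index(drop=True)` call in the notebook.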
temp[0:1]
[              time  lp_id  pct_access  engagement_index  district_id
 0       2020-01-27  32213      100.00           3000.00         8815
 1       2020-02-25  90153       33.33           2666.67         8815
 2       2020-02-25  99916        0.00               NaN         8815
 3       2020-02-25  28504        0.00               NaN         8815
 4       2020-02-25  95731       33.33            333.33         8815
 ...            ...    ...         ...               ...          ...
 134921  2020-12-31  98468        0.07              1.04         8815
 134922  2020-12-31  99984        0.00               NaN         8815
 134923  2020-12-31  90014        0.00               NaN         8815
 134924  2020-12-31  43876        0.00               NaN         8815
 134925  2020-12-31  57084        0.50             37.95         8815

 [134926 rows x 5 columns]]
engagement = pd.concat(temp)
engagement = engagement.reset_index(drop=True)
engagement.head()
| | time | lp_id | pct_access | engagement_index | district_id |
|---|---|---|---|---|---|
| 0 | 2020-01-27 | 32213.0 | 100.00 | 3000.00 | 8815 |
| 1 | 2020-02-25 | 90153.0 | 33.33 | 2666.67 | 8815 |
| 2 | 2020-02-25 | 99916.0 | 0.00 | NaN | 8815 |
| 3 | 2020-02-25 | 28504.0 | 0.00 | NaN | 8815 |
| 4 | 2020-02-25 | 95731.0 | 33.33 | 333.33 | 8815 |
engagement.shape
(17435744, 5)
len(engagement.district_id.unique())
176
engagement.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17435744 entries, 0 to 17435743
Data columns (total 5 columns):
 #   Column            Dtype
---  ------            -----
 0   time              object
 1   lp_id             float64
 2   pct_access        float64
 3   engagement_index  float64
 4   district_id       int64
dtypes: float64(3), int64(1), object(1)
memory usage: 665.1+ MB
engagement[engagement['district_id']==3670].time.unique()
array(['2020-02-15', '2020-02-16', '2020-02-17', '2020-02-18',
'2020-02-19', '2020-02-20', '2020-02-21', '2020-02-22',
'2020-02-23', '2020-02-24', '2020-02-25', '2020-02-26',
'2020-02-27', '2020-02-28', '2020-03-02'], dtype=object)
engagement[engagement['district_id']==2872].time.unique()
array(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
'2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
'2020-01-09', '2020-01-10', '2020-01-11', '2020-01-12',
'2020-01-13', '2020-01-14', '2020-01-15', '2020-01-16',
'2020-01-17', '2020-01-18', '2020-01-19', '2020-01-20',
'2020-01-21', '2020-01-22', '2020-01-23', '2020-01-24',
'2020-01-25', '2020-01-26', '2020-01-27', '2020-01-28',
'2020-01-29', '2020-01-30', '2020-01-31', '2020-02-04',
'2020-03-04'], dtype=object)
fig, ax = plt.subplots(1, 1, figsize=(8,4))
sns.histplot(engagement.groupby('district_id').time.nunique(), bins=30)
ax.set_title('Unique Days of Engagement Data per District')
plt.show()
# Delete the engagement frame inspected above and rebuild it,
# keeping only districts with data for all 366 days of 2020.
del engagement
temp = []
for district in districts_info.district_id.unique():
    df = pd.read_csv(f'{PATH}/{district}.csv', index_col=None, header=0)
    df["district_id"] = district
    if df.time.nunique() == 366:
        temp.append(df)
engagement = pd.concat(temp)
engagement = engagement.reset_index(drop=True)
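If the per-district frames are already concatenated, the same full-year filter can be expressed with a grouped count instead of re-reading the files. This is a sketch on synthetic data; `toy`, `full`, and `partial` are made-up names and values.

```python
import pandas as pd

# 2020 is a leap year, so a complete calendar has 366 days.
days = pd.date_range("2020-01-01", "2020-12-31", freq="D")

# District 1 reports every day; district 2 reports only the first 15 days.
full = pd.DataFrame({"time": days, "district_id": 1, "pct_access": 0.1})
partial = pd.DataFrame({"time": days[:15], "district_id": 2, "pct_access": 0.2})
toy = pd.concat([full, partial], ignore_index=True)

# Number of distinct days per district, broadcast back onto every row.
days_per_district = toy.groupby("district_id")["time"].transform("nunique")

# Keep only rows from districts that cover the whole year.
complete = toy[days_per_district == 366].reset_index(drop=True)
print(complete["district_id"].unique())
```

This trades extra memory (the unfiltered frame must exist first) for not having to read every CSV a second time.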
districts_info.shape, products_info.shape
((176, 7), (372, 9))
# Keep only the districts and products that appear in the full-year 2020 data.
districts_info = districts_info[districts_info.district_id.isin(engagement.district_id.unique())].reset_index(drop=True)
products_info = products_info[products_info['LP ID'].isin(engagement.lp_id.unique())].reset_index(drop=True)
districts_info.shape, products_info.shape
((176, 7), (369, 9))
engagement.time = engagement.time.astype('datetime64[ns]')
us_state_abbrev = {
'Alabama': 'AL',
'Alaska': 'AK',
'American Samoa': 'AS',
'Arizona': 'AZ',
'Arkansas': 'AR',
'California': 'CA',
'Colorado': 'CO',
'Connecticut': 'CT',
'Delaware': 'DE',
'District Of Columbia': 'DC',
'Florida': 'FL',
'Georgia': 'GA',
'Guam': 'GU',
'Hawaii': 'HI',
'Idaho': 'ID',
'Illinois': 'IL',
'Indiana': 'IN',
'Iowa': 'IA',
'Kansas': 'KS',
'Kentucky': 'KY',
'Louisiana': 'LA',
'Maine': 'ME',
'Maryland': 'MD',
'Massachusetts': 'MA',
'Michigan': 'MI',
'Minnesota': 'MN',
'Mississippi': 'MS',
'Missouri': 'MO',
'Montana': 'MT',
'Nebraska': 'NE',
'Nevada': 'NV',
'New Hampshire': 'NH',
'New Jersey': 'NJ',
'New Mexico': 'NM',
'New York': 'NY',
'North Carolina': 'NC',
'North Dakota': 'ND',
'Northern Mariana Islands':'MP',
'Ohio': 'OH',
'Oklahoma': 'OK',
'Oregon': 'OR',
'Pennsylvania': 'PA',
'Puerto Rico': 'PR',
'Rhode Island': 'RI',
'South Carolina': 'SC',
'South Dakota': 'SD',
'Tennessee': 'TN',
'Texas': 'TX',
'Utah': 'UT',
'Vermont': 'VT',
'Virgin Islands': 'VI',
'Virginia': 'VA',
'Washington': 'WA',
'West Virginia': 'WV',
'Wisconsin': 'WI',
'Wyoming': 'WY'
}
districts_info['state_abbrev'] = districts_info['state'].replace(us_state_abbrev)
districts_info_by_state = districts_info['state_abbrev'].value_counts().to_frame().reset_index(drop=False)
districts_info_by_state.head()
| | index | state_abbrev |
|---|---|---|
| 0 | CT | 30 |
| 1 | UT | 29 |
| 2 | MA | 21 |
| 3 | IL | 18 |
| 4 | CA | 12 |
districts_info_by_state.columns = ['state_abbrev', 'num_districts']
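The column names produced by `value_counts().to_frame().reset_index()` differ between pandas versions, which is why the cell above overwrites them by position. A sketch that names both columns explicitly, on a made-up list of states, could look like this:

```python
import pandas as pd

# Toy stand-in for districts_info['state_abbrev'].
toy = pd.DataFrame({"state_abbrev": ["CT", "CT", "UT", "CT", "UT", "MA"]})

by_state = (
    toy["state_abbrev"]
    .value_counts()                       # counts, sorted in descending order
    .rename_axis("state_abbrev")          # name the index explicitly
    .reset_index(name="num_districts")    # name the count column explicitly
)
print(by_state)
```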
fig = go.Figure()
layout = dict(
title_text = "Number of Available School Districts per State",
geo_scope='usa',
)
fig.add_trace(
go.Choropleth(
locations=districts_info_by_state.state_abbrev,
zmax=1,
z = districts_info_by_state.num_districts,
locationmode = 'USA-states', # set of locations match entries in `locations`
marker_line_color='white',
geo='geo',
colorscale=px.colors.sequential.Teal,
)
)
fig.update_layout(layout)
fig.show()
fig, ax = plt.subplots(1, 2, figsize=(16,4))
sns.countplot(data=products_info, x='pri_function_main', palette ='GnBu', ax=ax[0])
ax[0].set_title('Main Categories in Primary Functions')
sns.countplot(data=products_info[products_info.pri_function_main == 'LC'],
x='pri_function_sub', palette ='GnBu', ax=ax[1])
ax[1].set_title('Sub-Categories in Primary Function LC')
ax[1].set_xticklabels(ax[1].get_xticklabels(), rotation=90)
plt.show()
virtual_classroom_lp_id = products_info[
products_info.pri_function_sub == 'Virtual Classroom']['LP ID'].unique()
# Remove weekends from the dataframe
engagement['weekday'] = pd.DatetimeIndex(engagement['time']).weekday
engagement_without_weekends = engagement[engagement.weekday < 5]
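The weekday filter relies on pandas numbering Monday as 0 and Sunday as 6, so `weekday < 5` keeps Monday through Friday. A quick check on four hand-picked dates (a Friday, a Saturday, a Sunday, and a Monday in March 2020):

```python
import pandas as pd

toy = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-06", "2020-03-07", "2020-03-08", "2020-03-09"])
})

# .dt.weekday is the Series equivalent of pd.DatetimeIndex(...).weekday.
toy["weekday"] = toy["time"].dt.weekday

# Saturday (5) and Sunday (6) are dropped.
weekdays_only = toy[toy["weekday"] < 5]
print(weekdays_only)
```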
# Figure 1
f, ax = plt.subplots(nrows=1, ncols=1, figsize=(24, 6))
# One line per Virtual Classroom product: mean pct_access across districts per day
for virtual_classroom_product in virtual_classroom_lp_id:
    temp = engagement_without_weekends[
        engagement_without_weekends.lp_id == virtual_classroom_product
    ].groupby('time').pct_access.mean().to_frame().reset_index(drop=False)
    sns.lineplot(x=temp.time, y=temp.pct_access,
                 label=products_info[
                     products_info['LP ID'] == virtual_classroom_product]['Product Name'].values[0])
plt.legend()
plt.show()
# Figure 2
f, ax = plt.subplots(nrows=1, ncols=1, figsize=(24, 6))
# One line per Virtual Classroom product: mean engagement_index across districts per day
for virtual_classroom_product in virtual_classroom_lp_id:
    temp = engagement_without_weekends[
        engagement_without_weekends.lp_id == virtual_classroom_product
    ].groupby('time').engagement_index.mean().to_frame().reset_index(drop=False)
    sns.lineplot(x=temp.time,
                 y=temp.engagement_index,
                 label=products_info[
                     products_info['LP ID'] == virtual_classroom_product]['Product Name'].values[0])
plt.legend()
plt.show()
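Figures 1 and 2 differ only in the metric being averaged, so the aggregation could be factored into one helper. The sketch below is an assumption about how that might look, not the notebook's code: `daily_mean_by_product` is a hypothetical name and the data is made up.

```python
import pandas as pd


def daily_mean_by_product(engagement, lp_ids, metric):
    """Mean of `metric` per day, with one column per product id."""
    subset = engagement[engagement["lp_id"].isin(lp_ids)]
    return subset.groupby(["lp_id", "time"])[metric].mean().unstack("lp_id")


# Product 1 is seen in two districts on 2020-03-02 and in one on 2020-03-03;
# product 2 is seen once.
toy = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-02", "2020-03-02", "2020-03-02", "2020-03-03"]),
    "lp_id": [1, 1, 2, 1],
    "pct_access": [1.0, 3.0, 5.0, 4.0],
})

wide = daily_mean_by_product(toy, [1, 2], "pct_access")
print(wide)
```

The resulting wide frame has dates as the index and product ids as columns, so calling `wide.plot()` would draw one line per product; days on which a product has no rows show up as NaN.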
products_info.head()
| | LP ID | URL | Product Name | Provider/Company Name | sector_Corporate | sector_HigherEd | sector_PreK-12 | pri_function_main | pri_function_sub |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 13117 | https://www.splashmath.com | SplashLearn | StudyPad Inc. | 0 | 0 | 1 | LC | Digital Learning Platforms |
| 1 | 66933 | https://abcmouse.com | ABCmouse.com | Age of Learning, Inc | 0 | 0 | 1 | LC | Digital Learning Platforms |
| 2 | 50479 | https://www.abcya.com | ABCya! | ABCya.com, LLC | 0 | 0 | 1 | LC | Sites, Resources & Reference |
| 3 | 92993 | http://www.aleks.com/ | ALEKS | McGraw-Hill PreK-12 | 0 | 1 | 1 | LC | Digital Learning Platforms |
| 4 | 73104 | https://www.achieve3000.com/ | Achieve3000 | Achieve3000 | 0 | 0 | 1 | LC | Digital Learning Platforms |
display(products_info.sum())
display(products_info.groupby('pri_function_main')['pri_function_sub'].value_counts().to_frame())
LP ID                                                             20136352
URL                      https://www.splashmath.comhttps://abcmouse.com...
Product Name             SplashLearnABCmouse.comABCya!ALEKSAchieve3000A...
Provider/Company Name    StudyPad Inc.Age of Learning, Inc ABCya.com, L...
sector_Corporate                                                       115
sector_HigherEd                                                        179
sector_PreK-12                                                         348
dtype: object
| pri_function_main | pri_function_sub | pri_function_sub (count) |
|---|---|---|
| CM | Classroom Engagement & Instruction | 20 |
| | Teacher Resources | 7 |
| | Virtual Classroom | 7 |
| LC | Sites, Resources & Reference | 101 |
| | Digital Learning Platforms | 74 |
| | Content Creation & Curation | 35 |
| | Study Tools | 35 |
| | Courseware & Textbooks | 18 |
| | Online Course Providers & Technical Skills Development | 5 |
| | Career Planning & Job Search | 3 |
| LC/CM/SDO | Other | 16 |
| SDO | Data, Analytics & Reporting | 11 |
| | Learning Management Systems (LMS) | 5 |
| | Human Resources | 4 |
| | School Management Software | 4 |
| | Large-Scale & Standardized Testing | 2 |
| | Admissions, Enrollment & Rostering | 1 |
| | Environmental, Health & Safety (EHS) Compliance | 1 |
| | Other | 1 |